perm filename RECOG.LET[1,JMC] blob sn#005233 filedate 1971-10-15 generic text, type T, neo UTF8
00010	Dr. Lawrence Roberts
00020	Advanced Research Projects Agency
00030	Alexandria, Virginia
00040	
00050	
00100	Dear Larry:
00200	
00300		I think ARPA should look into the possibilities of making use
00400	of  the efforts that Information International has put into character
00500	recognition.  As I see it, the situation is as follows:
00600	
00700		1. The character recognition problem has not been  solved  in
00800	general  even  though  readers  exist  for special fonts. This is the
00900	claim of Dan Forsyth and others at III.  I don't know that it is  so,
01000	but I believe it, and I think it should be looked into.
01100	
01200		2.  File  sizes are getting to the point where it is becoming
01300	feasible to put the world's literature into computer files, and it is
01400	worthwhile  to  do  this  large  one-shot  task.  I believe that the
01500	Defense Department will find that it has enough literature of its own
01600	to put into computer files to justify an effort to do so.
01700	
01800		3.  At  its  own  expense  III  has developed a system called
01900	GRAFIX 1 for character reading.  The system consists of a  PDP-10,  a
02000	special  Binary  Image  Processor  (BIP), and a lot of software.   At
02100	present, they can do a number of moderately impressive demonstrations
02200	of character reading.
02300	
02400		4.  III's  original  objective  in developing GRAFIX 1 was to
02500	make a lot of money by selling such systems.  At present, they seem rather
02600	discouraged  about  this  partly  because of the general state of the
02700	economy, and, I would guess, partly because they don't have  as  good
02800	an  idea salesman as Fredkin was.  The company won't go broke if they
02900	scrap GRAFIX, because they have quite a bit of  cash,  and  the  FR80
03000	computer output microfilm system seems to be a successful product.
03100	
03200		5.  Nevertheless,  there  seems  to  me  to  be a substantial
03300	probability that the effort so far put into character recognition  in
03400	this  project  will  be lost just as using it on a large scale to get
03500	the scientific literature into  computer  files  is  becoming  a  real
03600	possibility.
03700	
03800		6.  My  former student Takayasu Ito, who now heads a group at
03900	Mitsubishi, is talking to III about buying the whole project at  what
04000	amounts  to  used  computer  prices,  i.e.,  500K.   Fenaughty may be
04100	inclined  to  sell  it  to  him  although  he  is  looking  into  the
04200	possibility of finding a Japanese buyer at a higher price.
04300	
04400		There are several things ARPA could do about the situation:
04500	
04600		1.  The most straightforward option is to give III a research
04700	contract at about $220K per year which  would  pay  for  the  present
04800	level of effort without amortizing any of their expenses.  When the
04900	
05000		2. An ARPA contractor could be encouraged to buy a machine or
05100	buy out the project.  This would cost more, since the machine contains
05200	a  PDP-10,  and  would  probably  result  in  dissipating the present
05300	research group.  It also presents the difficulty that  none  of  the
05400	present ARPA contractors is in that line of research.
05500	
05600		3.  ARPA could arrange for III to  get  production  contracts
05700	for putting documents into computers, large enough to keep the group going.
05800	There  will  be more motivation to do this after the terabit file and
05900	some of its brothers are working, but work  done  in  this  direction
06000	will not be lost if the material converted has permanent value.
06100	
06200		The first alternative seems to me to be  the  best,  but  the
06300	others are tolerable.  If the second is chosen, it would be better if
06400	the contractor in question were someone other than Stanford  since  I
06500	am on the board of directors of III.
06600	
06700	
06800						Sincerely yours,
06900	
07000	
07100						John McCarthy